Methylator - A Comprehensive DNA Cytosine Methylation Analysis Pipeline in Galaxy

Jonas Bucher1, Masaomi Hatakeyama2,3, Ueli Grossniklaus1,4, Deepak Tanwar1,4


1 Plant Development Genetics, Department of Plant and Microbial Biology, University of Zurich
2 Evolutionary and Ecological Genomics, Department of Evolutionary Biology and Environmental Studies, University of Zurich
3 Functional Genomics Center Zurich, ETH Zurich and University of Zurich
4 URPP Evolution in Action, Department of Plant and Microbial Biology, University of Zurich

jonas.bucher@uzh.ch | deepak.tanwar@evolution.uzh.ch

Introduction

DNA cytosine methylation is the addition of a methyl group to a cytosine base in DNA. It affects transcription and therefore plays a major role in several vital processes. In mammals, DNA cytosine methylation occurs predominantly in the CG sequence context. In plants, methylation is also common in the CHG and CHH contexts, where H stands for A, C, or T.


Figure 1: Methylation contexts in mammals and plants.
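
The three contexts can be told apart from the two bases that follow a cytosine. The following is a minimal Python sketch for illustration only, not code from Methylator: the function names `cytosine_context` and `forward_strand_contexts` are hypothetical, and only cytosines on the forward strand are considered.

```python
from typing import Dict, Optional


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Return 'CG', 'CHG' or 'CHH' for the cytosine at index i, else None.

    None is returned when seq[i] is not a C or when the downstream bases
    needed to decide the context are missing or ambiguous (N).
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    downstream = seq[i + 1 : i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2 or "N" in downstream:
        return None  # context cannot be determined
    # The first downstream base is H (A, C or T); the second decides CHG vs CHH.
    return "CHG" if downstream[1] == "G" else "CHH"


def forward_strand_contexts(seq: str) -> Dict[int, str]:
    """Map each forward-strand cytosine position to its sequence context."""
    contexts = {}
    for i in range(len(seq)):
        context = cytosine_context(seq, i)
        if context is not None:
            contexts[i] = context
    return contexts


print(forward_strand_contexts("ACGTCAGCTTC"))
```

Cytosines on the reverse strand appear as G on the forward strand; a complete implementation would classify them on the reverse complement in the same way.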

Various tools have been introduced to facilitate the analysis of DNA cytosine methylation data. Usually, they focus on a small part of the workflow, which still leaves users with a considerable amount of work to evaluate appropriate tools, transform intermediate output, and finally generate publication-ready figures. Additionally, many tools are limited in terms of the input data, only providing support for certain species and/or specific library preparation methods.

Pipeline overview

Here we introduce Methylator, a pipeline for complete DNA cytosine methylation analysis that offers an easy-to-use interface, reproducible runs, and interactive visualizations of results.


Figure 2: Overview of Methylator.

Galaxy

Galaxy is an open-source, web-based platform for accessible, reproducible, and transparent computational biomedical research. It provides a user-friendly interface for executing complex bioinformatics tools and workflows without requiring programming expertise. Galaxy enables researchers to upload their data, run analyses using a vast array of tools, and share their entire analytical workflow with others. This approach not only facilitates collaboration but also enhances the reproducibility of scientific findings.


Figure 3: Simplified Galaxy Workflow.

Duplicate removal

Although technical duplicate reads can arise from different sources, most deduplication tools focus on PCR duplicates. Clumpify can also handle other types of duplicates in sequencing data, such as optical duplicates. Because the abundance of each duplicate type depends on the sequencing technology used, Methylator adapts duplicate removal to the user input.

Types of duplicates in sequencing data


Figure 3: Types of duplicates that can be present after sequencing. Adapted from biostars.org/p/229842/
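
The difference between optical duplicates and other duplicates can be illustrated with a toy classifier. This sketch is not Clumpify's algorithm: it assumes Illumina read names in the Casava 1.8+ layout (`instrument:run:flowcell:lane:tile:x:y`), treats reads with identical sequence as duplicates, and calls a duplicate "optical" when it lies on the same lane and tile within `max_dist` of the first copy. The function names and the threshold value are hypothetical; suitable distances are platform-dependent.

```python
from collections import defaultdict
from math import hypot


def parse_header(header):
    """Extract (lane, tile, x, y) from an Illumina (Casava 1.8+) read name."""
    fields = header.lstrip("@").split()[0].split(":")
    lane, tile, x, y = (int(value) for value in fields[3:7])
    return lane, tile, x, y


def classify_duplicates(reads, max_dist=40):
    """Label each read as 'kept', 'optical' or 'pcr_or_other'.

    reads: list of (header, sequence) tuples.
    Simplification: every duplicate is compared with the first copy only.
    """
    groups = defaultdict(list)
    for header, seq in reads:
        groups[seq].append(header)

    labels = {}
    for headers in groups.values():
        first = headers[0]
        labels[first] = "kept"  # first copy is retained as the representative
        lane0, tile0, x0, y0 = parse_header(first)
        for header in headers[1:]:
            lane, tile, x, y = parse_header(header)
            nearby = (lane, tile) == (lane0, tile0) and hypot(x - x0, y - y0) <= max_dist
            labels[header] = "optical" if nearby else "pcr_or_other"
    return labels


reads = [
    ("@M1:1:FC1:1:1101:1000:2000 1:N:0:1", "ACGTACGT"),
    ("@M1:1:FC1:1:1101:1010:2005 1:N:0:1", "ACGTACGT"),  # same tile, close by
    ("@M1:1:FC1:1:2104:5000:9000 1:N:0:1", "ACGTACGT"),  # different tile
    ("@M1:1:FC1:1:1101:3000:4000 1:N:0:1", "TTGCAAGC"),  # no duplicate
]
for header, label in classify_duplicates(reads).items():
    print(label, header)
```

Separating the two classes matters because the distance threshold only makes sense for duplicates that arise on the flow cell, whereas PCR duplicates can be located anywhere.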

Alignment

Bisulfite treatment converts unmethylated cytosines to uracils, which are read as thymines after amplification. This reduces sequence complexity and lowers mapping efficiency, so a large proportion of the sequencing data is lost to the analysis.
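
The loss of complexity can be shown with a small in-silico conversion. This is an illustrative sketch under the assumption of complete conversion on the forward strand; the function name `bisulfite_convert` and the example sequences are hypothetical.

```python
def bisulfite_convert(seq, methylated=frozenset()):
    """Simulate bisulfite conversion of one strand.

    Unmethylated C is read as T; positions listed in `methylated` stay C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


# A read with one methylated cytosine (index 1) keeps only that C.
print(bisulfite_convert("ACGTTCGACCGA", methylated={1}))

# Two different genomic sequences become indistinguishable once converted,
# which is why converted reads map less uniquely.
locus_a = "TCGATC"
locus_b = "TTGATT"
print(bisulfite_convert(locus_a) == bisulfite_convert(locus_b))
```

After conversion the unmethylated sequence is written almost entirely with three letters, so more reads match several genomic locations equally well and are discarded as ambiguous.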

Dirty Harry method

The Dirty Harry method improves the mapping rate by remapping unaligned reads in local alignment mode. This increases mapping efficiency and retains a considerable number of cytosine sites that would otherwise be lost.


Figure 4: Dirty Harry alignment method. Adapted from Wu et al. (2019)
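
The two-pass idea can be sketched with a toy aligner. This is not the implementation of Wu et al. (2019) or of any real aligner: it assumes that the first pass requires the whole read to match, uses exact matching on C-to-T converted sequences, and stands in for local alignment with a longest-common-substring search. All function names, the `min_len` threshold, and the sequences are hypothetical.

```python
from difflib import SequenceMatcher


def c_to_t(seq):
    """In-silico C-to-T conversion so that bisulfite reads and reference compare."""
    return seq.upper().replace("C", "T")


def align_end_to_end(read, ref):
    """Toy first pass: the full converted read must occur in the converted reference."""
    pos = c_to_t(ref).find(c_to_t(read))
    return pos if pos >= 0 else None


def align_local(read, ref, min_len=12):
    """Toy second pass: longest exact shared substring, accepted if long enough."""
    r, g = c_to_t(read), c_to_t(ref)
    match = SequenceMatcher(None, r, g, autojunk=False).find_longest_match(
        0, len(r), 0, len(g)
    )
    return (match.b, match.size) if match.size >= min_len else None


def two_pass_alignment(reads, ref, min_len=12):
    """Return {name: (mode, reference_start, aligned_length)} for each read."""
    results = {}
    for name, read in reads.items():
        pos = align_end_to_end(read, ref)
        if pos is not None:
            results[name] = ("end-to-end", pos, len(read))
            continue
        hit = align_local(read, ref, min_len)  # only reads that failed pass one
        results[name] = ("local", hit[0], hit[1]) if hit else ("unaligned", None, 0)
    return results


ref = "GATTACAGGCTTAACCGTAGGATCCATGCAAGTCTGAACG"
reads = {
    "r1": "TAGGTTTAATCGTAGGATTT",      # clean read, one methylated C retained
    "r2": "AAAAAAAAGATTTATGTAAGTTTG",  # leading junk bases, rest matches
    "r3": "ACACACACACACACAC",          # unrelated sequence
}
for name, result in two_pass_alignment(reads, ref).items():
    print(name, result)
```

In this toy run, `r2` fails the first pass because of its junk prefix but is rescued in the second pass, which is the effect the remapping step aims for on real data.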

Visualization

For each type of analysis, several outputs are generated and visualized in a stand-alone interactive Shiny app. Publication-ready figures are created using recommended colourblind-friendly palettes. Each plot can be customized and downloaded individually by the user.


Figure 6: Dot plot of biological processes enrichment.


Figure 7: Volcano plot of genomic regions enrichment.

Visual customization


Figure 8: Colour customization of figures.

Comparison with other tools

Feature | Methylator | methylseq | ARPEGGIO | MethylStar | MethylC-analyzer | Bicycle
Platform independent | | | | | |
Self-contained | | | | | |
Interface | GUI (SUSHI, Galaxy, Shiny), CLI | CLI | CLI | CLI | CLI, GUI | CLI
Input data | WGBS, RRBS, PBAT, TAPS, ABBS, single-cell | WGBS, RRBS | WGBS | WGBS, PBAT | WGBS, RRBS | Targeted BS
Single-cell | | | | | |
Bulk | | | | | |
Supported genomes | Mammals, plants (incl. polyploids) | General | Polyploids | Mammals, plants | Mammals, plants | General
Quality control | | | | | |
Alignment | Bismark, Arioc (GPU-based), EAGLE-RC | Bismark, bwa-meth | Bismark, EAGLE-RC | Bismark | | Bowtie (custom)
Deduplication | Clumpify | Bismark, Picard | Bismark | Bismark | | Custom
Exploratory data analysis | PCA, heatmaps, methylation summaries | | | | PCA, heatmaps, methylation summaries |
Differential methylation analysis | DMRs, DMLs | | | | DMRs and DMGs |
Copy number variation analysis | CNVkit | | | | |
Functional analysis | GO, KEGG, Reactome, user-defined | | | | |
Motif analysis | HOMER | | | | |
Visualization | | | | | |
Year published | In development | 2020 | 2021 | 2020 | 2023 | 2018

Table 1: Feature comparison with widely used tools.

GUI = graphical user interface, CLI = command line interface
DMR/DML/DMG = Differentially Methylated Regions/Loci/Genes
GO = Gene Ontology

Conclusion

User-friendly, reproducible and sustainable DNA cytosine methylation data analysis pipeline
Support for plants & mammals
Can analyze methylation data from various library preparation methods for bulk sequencing
Interactive visualizations of results

References

Wu, Peng, Yan Gao, Weilong Guo, and Ping Zhu. 2019. “Using Local Alignment to Enhance Single-Cell Bisulfite Sequencing Data Efficiency.” Bioinformatics 35 (September): 3273–78. https://doi.org/10.1093/bioinformatics/btz125.